\section{Rascal, a DSL for meta-programming}

%\def\rascal{\textsc{Rascal}\xspace}
\def\oberon{\textsc{Oberon-0}\xspace}

% todo: use rascal.sty
\def\rcode#1{\texttt{#1}}

%\subsection{Introduction}

\noindent \Rascal is a domain-specific language for
meta-programming~\cite{rascal,RascalLWC11}. It has been designed to deal with software languages
in the broadest sense of the word. Its application areas include the
development of DSLs, reverse engineering and reengineering of (legacy)
software systems and software repository mining. \Rascal features
powerful domain-specific language constructs to make meta programming
more effective:
\begin{itemize}
\item Integrated syntax definition: context-free grammars can be
  declared as part of ordinary \Rascal code; the concrete syntax trees defined
  by such grammars are first-class values, and non-terminals are
  first-class types.
\item Built-in data types: apart from the standard data types (int,
  bool, string etc.), \Rascal supports sets, relations, maps, lists,
  algebraic data types (ADTs), concrete syntax trees, and source
  locations. Every value in Rascal can be used in pattern
  matching. This includes concrete syntax trees produced by \Rascal's
  grammar.
\item Constructs for analysis and transformation: sets, relations,
  maps and lists can be used in comprehensions. The visit-statement
  allows for strategic traversal and rewriting of abstract or
  concrete syntax trees.
\end{itemize}
We have aimed to make Rascal easy to learn and debug by adhering to well-known C/Java/C\#-like syntax and familiar programming idioms (mostly standard and explicit control-flow, functional programming, and pattern-matching).

\subsection{Overview of the \Rascal \oberon implementation}

\noindent \Rascal allows computer languages to be implemented in a modular
fashion, through the simultaneous extension of syntax definition
(grammars), algebraic data types (ADTs) and functions operating on
data conforming to these data types. The \oberon implementation
consists of four layers corresponding to each language level. An
overview of the architecture is shown in Table~\ref{TBL:rascalArch}.

\begin{table}
\begin{center}\small
\begin{tabular}{|l||l|l|l|}\hline
   &  Syntax & Data Type & Function \\\hline\hline
L1 &  \textbf{grammar} & \textbf{AST}, \textbf{scope} & 
\textbf{bind}, \textbf{check}, \textbf{format}, \textbf{compile} \\\hline 
L2 & grammar & AST & bind, check, format, \textbf{desugar} \\\hline
L3 & grammar & AST, scope & bind, check, format, compile, \textbf{lift} \\\hline
L4 & grammar & AST & bind, check, format, compile\\\hline%\hline
%\multicolumn{4}{|r|}{\textit{Total:}} & 1394 \\\hline
\end{tabular}
\end{center}
\caption{Layered architecture of the \Rascal \oberon implementation;
  up-right font indicates extension, \textbf{bold} indicates introduction.\label{TBL:rascalArch}}
\end{table}

The table show four columns, corresponding to syntax definition,
(algebraic) data types and functions; the last column shows the number of source-lines of code (SLOC) required to implement the component on a row.  Each row corresponds to a
language level. The function \irascal{bind} implements name analysis, \irascal{check} implements type checking, \irascal{format} implements pretty printing and \irascal{compile} realizes code generation to C. Furthremore, the auxiliar functions \irascal{desugar} and \irascal{lift} represent semantics preserving source-to-source transformations on \oberon to deal with the L2 constructs and to simplify compilation respectively. 

Level L1 represents the base case: all relevant
syntax, data types and functions are introduced here. In the
subsequent levels any of these types and functions may have to be
extended. For instance, in L2, where the FOR and CASE statements are
introduced, the grammar and AST definition of L1 are extended with new
alternatives to deal with these constructs. Similarly, the functions
bind, check and format are extended. However, the compile function is
not extended; instead FOR and CASE statements are desugared to L1
constructs. Note also that the scope data type used during name
analysis is reused as is. This data type only needs to be extended
when procedures are introduced in L3. In L3 we introduce a lift
function to move nested procedures to top level.

Extension in \Rascal works through the \texttt{extend} construct. This
constructs works similar to module import, except that it allows the
extension of recursively defined abstractions, in this case ADTs and
functions. This leads to a mechanism similar to inheritance in
object-oriented languages. For grammars and ADTs the
definitions of the extended module and the extending module are simply
merged. Function extension, on the other hand, depends on a particular
style of writing \Rascal functions, in a way similar to term rewriting
rules. Consider, for instance, the
following rule implementing the \texttt{check} function for
type-checking WHILE statements:
\begin{quote}
\begin{rascal}
set[Message] check(whileDo(c, b)) =
                 checkCond(c) + checkBody(b);
\end{rascal}
\end{quote}
Normal Rascal functions have a body consisting of zero or more statements  enclosed in curly braces. However, if the result is a single expression, a function can be written in this shorter notation. The function uses pattern
matching in its signature to dispatch on the WHILE AST constructor;
this style of function definition hence is called
\textit{pattern-based dispatch}. The complete implementation of the
\irascal{check} function consists of multiple such rules, one for each
AST constructor. Such rules may be distributed over different modules
to obtain extensibility. For instance, in L3 where procedure
definitions and procedure calls are introduced, the \irascal{check}
function is extended by adding an implementation rule dispatching on
the AST constructor for procedure calls. In the \oberon implementation
\irascal{bind}, \irascal{check}, \irascal{format}, and \irascal{compile}
are all modularized using pattern-based dispatch.

For syntax and abstract syntax the mechanism is similar. In higher
layers (i.e. L2, L3, and L4), extension modules add alternatives to
both grammars and ADTs. In L3 this mechanism is also used to extend
the symbol table data type to deal with nested scopes. 

\subsection{Scanning and parsing}

\noindent \Rascal's syntax definitions are backed by a scannerless variant of the GLL parsing algorithm~\cite{GLL}. Scannerless
parsing solves the problem of modular lexical syntax. To deal with
(lexical) ambiguities, \Rascal has built-in support for longest-match
and keyword reservation. Consequently, each language level may
introduce additional reserved keywords, that are not reserved in
previous levels. For instance, the ``FOR'' keyword, introduced in L2,
is not reserved in L1.

Grammars are annotated with constructor names to obtain automatic
mapping of concrete syntax trees to ASTs. The \Rascal standard library function,
\texttt{implode}, converts a concrete syntax tree into an AST, given
an ADT definition of the abstract syntax. Implode uses the
\textit{reified ADT} as a recipe. Reified types are reflective value
representations of \Rascal types. The information in the reified type
is used by implode to decide how a concrete tree must be mapped to an
AST. For instance, some lexical concrete syntax nodes are mapped to
(native) integers if this is dictated by the abstract syntax ADT.

The statement syntax of \oberon L1 is defined using the grammar shown in Listing~\ref{LST:rascal-concrete-Statement}.
\begin{listing}
\begin{quote}
\begin{rascal}
syntax Statement 
  = assign: Ident ":=" Expression
  | ifThen: "IF" Expression "THEN" 
               {Statement ";"}+  ElsIfPart* ElsePart? "END"
  | whileDo: "WHILE" Expression "DO" {Statement ";"}+ "END"
  | skip: ;
\end{rascal}
\end{quote}
\caption{Concrete syntax definition of \oberon statements in L1}
\label{LST:rascal-concrete-Statement}
\end{listing}
Each alternative is labeled with a constructor name that corresponds
to a constructor in the abstract syntax, which (for statements) is
defined as shown in Listing~\ref{LST:rascal-abstract-Statement}.
\begin{listing}
\begin{quote}
\begin{rascal}
data Statement 
  = assign(Ident var, Expression exp)
  | ifThen(Expression condition, list[Statement] body, 
       list[ElsIf] elseIfs, list[Statement] elsePart)
  | whileDo(Expression condition, list[Statement] body)
  | skip()
  ;
\end{rascal}
\end{quote}
\caption{Abstract syntax definition of \oberon statements in L1}
\label{LST:rascal-abstract-Statement}
\end{listing}
In the abstract syntax ordinary \Rascal lists are used for regular
operators.  The \Rascal parser annotates concrete syntax trees with source
locations (a native datatype in Rascal). The implode function propagates these
locations to the AST so that meaningful error messages can be given
during later processing.

Scannerless means that there is no separate tokenization phase: both
lexical and context-free syntax are described using the same grammar
formalism. This makes it, for instance, trivially easy to support
\oberon's nested comments, the grammar for which is shown in Listing~\ref{LST:rascal-Comment}.
\begin{listing}
\begin{quote}
\begin{rascal}
lexical Comment = "(*" CommentElt* "*)" ;
lexical CommentElt = CommentChar+ | Comment ;
lexical CommentChar = ![*(] | [*] !>> [)] | [(] !>> [*];
\end{rascal}
\end{quote}
\caption{Grammar describing \oberon's nested comments}
\label{LST:rascal-Comment}
\end{listing}
In essence, this is just context-free syntax, except the
\texttt{lexical} keyword indicates that no layout is allowed between
symbols. The \irascal{CommentChar} non-terminal captures all characters except
``*'' and ``('', and uses follow restrictions (``\texttt{!>>}'') to
allow both ``*'' and ``('' if they are not followed by ``)'' and ``*''
respectively.

\subsection{Formatting}

\noindent Formatting in \Rascal consists of mapping (abstract) syntax
trees to Box constructs~\cite{BV96}. The resulting Box expressions
describes how elements should be layed out, e.g., horizontally,
vertically, indented or aligned, which font-style to use, and how
adjacent elements should be spaced relative to one another.

As an example, consider the formatting of the WHILE statement in Listing~\ref{LST:rascal-stat2box-while}.
\begin{listing}
\begin{quote}
\begin{rascal}
Box stat2box(whileDo(c, b)) = V([
   H([KW(L("WHILE")), exp2box(c), KW(L("DO"))])[@hs=1],
      I([V(hsepList(b, ";", stat2box))]),
   KW(L("END"))
]);  
\end{rascal} 
\end{quote}
\caption{Formatting WHILE-statements}
\label{LST:rascal-stat2box-while}
\end{listing}
Again this function uses pattern-based dispatch to match the AST
constructor for WHILE statements. The result of the function is a Box
expression (which is just an ADT in \Rascal). The result dictates that
the WHILE and DO are both keywords (KW), and the condition \irascal{c}
is formatted using the function \irascal{exp2box}. All three sub-boxes
should be laid out horizontally (H). Finally, the spacing between the
elements should be 1, as indicated by the annotation \texttt{@hs=1}.
The body \irascal{b} of the WHILE statement should be indented one
level (I), and the statements should be placed vertically. The helper
function \irascal{hsepList} is used to correctly combine separated
lists where the separator (``;'') should be horizontally combined with
an element; it is parameterized by a function to convert each
element of the list to a Box expression (i.e. \irascal{stat2box}).
Finally, the three sub-boxes are wrapped in a V box so that the
header, body and END keyword are placed vertically.

A common problem with pretty-printing is the (re)insertion of
parentheses in binary expressions according to the precedence and
associativity rules of the grammar. In \Rascal this can be solved
using reified types. Above, we described how \irascal{implode} used
the reified type of the abstract syntax ADT to guide the mapping of
concrete syntax trees to ASTs. In this case we proceed the other way around: we
use the reified type of the \textit{grammar} to obtain precedence and
associativity information. Since all non-terminals are first-class
types in \Rascal, obtaining the reified type of a non-terminal gives
us the complete grammar. 

To illustrate how this works, consider the following formatting rule
for multiplication expressions:
\begin{quote}
\begin{rascal}
Box exp2box(p:mul(lhs, rhs)) = 
    H([exp2box(p, lhs), L("*"), exp2box(p, rhs)])[@hs=1];
\end{rascal}
\end{quote}
This rule lays out expressions horizontally. But instead of calling
directly the unary function \irascal{exp2box} on \irascal{lhs} and
\irascal{rhs}, there is an intermediate call to \irascal{exp2box} with
two arguments; in both cases the first argument is \irascal{p} which
is bound to the current expression using \texttt{p:}. This parent
expression is used to decide whether to insert parentheses or not:
\begin{quote}
\begin{rascal}
Box exp2box(Expression parent, Expression kid) =
    parens(PRIOS, parent, kid, exp2box(kid), parenizer);

Box parenizer(Box box) = H([L("("), box, L(")")])[@hs=0];
\end{rascal}
\end{quote}
In this example, the global constant \irascal{PRIOS} contains
the precedence information obtained from the grammar. The
\irascal{parens} function checks if the occurence of \irascal{kid}
directly below \irascal{parent} requires parentheses, and if so, calls
\irascal{parenizer} on the Box expression resulting from
\irascal{exp2box(kid}). The function \irascal{parenizer} simply
horizontally wraps the argument with parentheses.


\subsection{Name analysis}

\noindent Name analysis consists of a traversal of the AST that does
two things at the same time:
\begin{enumerate}
\item Annotate use sites of variables, types and constants with
  their declarations.
\item Maintain a set of error messages, indicating problems such as
  undeclared identifiers. 
\end{enumerate}
These two concerns are implemented in functions \irascal{bindModule},
\irascal{bindStat}, \irascal{bindExp} etc. They carry around a
\irascal{NEnv} environment which contains the names that are currently
in scope (cf. above for its definition). If an identifier is
encountered, it is looked up in the environment and, if found, the
identifier is annotated using a \Rascal annotation representing the
declaration.

In order to both annotate ASTs and return a set of error messages the
\irascal{bind} family of functions returns a tuple containing an AST
node and a set of error messsages. This type is defined as the
following \Rascal alias:
\begin{quote}
\begin{rascal}
alias Bound[&T] = tuple[&T ast, set[Message] errs];
\end{rascal}
\end{quote}

As an example, consider the binding of the IF-statement, 
shown in Listing~\ref{LST:rascal-bindStat}.
\begin{listing}
\begin{quote}\small
\begin{rascal}
Bound[Statement] bindStat(s:ifThen(c, b, eis, e), 
                     NEnv nenv, set[Message] errs) {
    <s.condition, errs> = bindExp(c, nenv, errs);
    <s.body, errs> = bindStats(b, nenv, errs);
    s.elseIfs = for (ei <- eis) {
      <ei.condition, errs> = bindExp(ei.condition, nenv, errs);
      <ei.body, errs> = bindStats(ei.body, nenv, errs);
      append ei;
    }
    <s.elsePart, errs> = bindStats(e, nenv, errs);
    return <s, errs>;
}
\end{rascal}
\end{quote}
\caption{Binding analysis of the \oberon IF-statement}
\label{LST:rascal-bindStat}
\end{listing}
The function heavily uses \Rascal's destructuring assignment to
simultaneously replace parts of the AST and update the \irascal{errs}
variable. For instance, the first statements invokes \irascal{bindExp}
to bind the condition of the IF-statement; the resulting annotated
expression is inserted in place of the original condition
(\irascal{s.condition}). The reason for this idiom is that it is
impossible to simply rebuild the tree, because then the source
location annotations would be lost. The for-loop folds over the list
of ELSIF's while at the same time (possibly) updating the set
\irascal{errs}. The final result is the annotated IF-node
(\irascal{s}) together with the \irascal{errs}.

The annotation of identifiers is implemented using the
function in Listing~\ref{LST:rascal-bindId}.
\begin{listing}
\begin{quote}\small
\begin{rascal}
Bound[Ident] bindId(Ident x, NEnv nenv, set[Message] errs) {
  if (isVisible(nenv, x))
    return <x[@decl=getDef(nenv, x)], errs>;  
  if (x.name in {"TRUE", "FALSE"})
    return <x[@decl=trueOrFalse(x.name == "TRUE")], errs>;
  return <x, errs + { undefIdErr(x@location) }>;
}
\end{rascal}
\end{quote}
\caption{Binding analysis for identifiers}
\label{LST:rascal-bindId}
\end{listing}
This function checks the name environment (\irascal{nenv}) if the
identifier \irascal{x} is currently visible. If so, then the
\irascal{x} is annotated with its definition
(\texttt{x[@decl=...]}). Otherwise, if \irascal{x} represents any of the
boolean constants TRUE or FALSE, it is annotated accordingly. If none
of the above holds, the identifier is undeclared and an error is
produced, containing the source location of \irascal{x}.

\subsection{Type checking}

\noindent Similar to the name-analysis, type checking is an abstract
interpretation of the AST, computing a set of error messages. Unlike
in the case of name-analysis, however, the AST is not annotated with
further information; all required annotations are assumed to be set
during name analysis. This simplifies type checking considerably,
since all required information is local to a certain AST node.

As an example, consider the code to check procedure calls in Listing~\ref{LST:rascal-check}.
\begin{listing}
\begin{quote}
\begin{rascal}
set[Message] check(s:call(f, as)) {
  if (!(f@decl is proc)) 
    return { notAProcErr(f@location) };
  if (size(as) != arity(f@decl.formals)) 
    return { argNumErr(s@location) };
  errs = {}; i = 0;
  for (frm <- f@decl.formals, n <- frm.names) {
    errs += checkFormal(n, as[i], frm.hasVar);
    i += 1;
  }
  return ( errs | it + check(a) | a <- as);
}
\end{rascal}
\end{quote}
\caption{Checking \oberon procedure calls}
\label{LST:rascal-check}
\end{listing}
First the declaration of the called procedure is retrieved in order to
check whether \irascal{f} actually represents a procedure. If not, an
error is returned. Second, the list of formal parameters is obtained from
the \irascal{decl} annotations and the arity of the procedure is
checked. Third, now we can assume that the number of actual parameters
is correct, each argument is checked against its corresponding formal
parameter using \irascal{checkFormal}. Finally each actual parameter
is checked individually for type errors. The result of this function
is the union of the set of errors collected in the for-loop and the
union of all sets of errors returned for each actual argument.

\subsection{Source-to-source transformation}

\noindent \Rascal features builtin support for structure-shy traversal
of data structures using the \rcode{visit} statement. Visit works like
a traditional case statement, only cases are matched at arbitrary
depth in a data structure. There are 6 builtin strategies to control
the traversal order. Visited nodes maybe replaced, thereby rewriting
the tree, similar to traversal rules \textsc{Asf+Sdf}~\cite{Traversals} and rewrite
strategies in \cite{Visser04}.


The desugaring of FOR-loops and CASE-statements both heavily depend on
the visit-statement. For instance, the desugaring of the
for-statements introduced in L2 is shown in
Listing~\ref{LST:for2while}. The function \irascal{for2while} visits
a list of statements. First FOR-loops without a BY-clause are
normalized to have a BY-clause of 1 (\irascal{nat(1)}). Second, if a
BY-clause uses a symbolic constant, it is replaced with the integer
the constant evaluates to. Then, in the third and fourth case,
FOR-loops are substituted for equivalent L1 \oberon nodes. Since a
single FOR-loop is desugared to a sequence of statements, we use a
temporary AST constructor, \irascal{begin}, to allow inserting
multiple statements in place of one. The \irascal{begin} nodes are
spliced into their surrounding context in a later phase. 
%Note that the
%use of the \rcode{innermost} keyword ensures that FOR-loops are first
%normalized and only after that will they be eventually desugared.

\begin{listing}[t]
\begin{quote}
\begin{rascal}
public list[Statement] for2while(list[Statement] stats) {
 return innermost visit (stats) {
  case forDo(n, f, t, [], b) => forDo(n, f, t, [nat(1)], b)
  case forDo(n, f, t, [lookup(x)], b) => forDo(n, f, t, [c], b)
           when x@decl is const, c:nat(_) := eval(x)
  case forDo(n, f, t, [nat(x)], b) => 
           begin([assign(n, f), whileDo(leq(lookup(n), t), 
              [b, assign(n, add(lookup(n), nat(x)))])]) 
  case forDo(n, f, t, [neg(nat(x))], b) => 
           begin([assign(n, f), whileDo(geq(lookup(n), t), 
              [b, assign(n, sub(lookup(n), nat(x)))])]) 
 }
}
\end{rascal}
\end{quote}
\caption{Desugaring \oberon FOR-loops to WHILE-loops in
  \Rascal\label{LST:for2while}} 
\end{listing}

The desugaring of the CASE-statement to nested IF-statements is
implemented in a similar way.


\subsection{Code generation}

\noindent Code generation to C works in two phases. The first phase consists of a source-to-source transformation on \oberon to make all identifiers unique and to move all (nested) procedures to top level. The second phase unparses such normalized \oberon programs to flat C code.

%Flattening \oberon programs heavily uses Rascal's support for tree %traversal using the \texttt{visit} statement.....

Although it would have been possible to define an abstract
syntax for C and then use \Rascal's Box formatting to obtain a textual
representation suitable for compiling to machine code, we have instead
opted for a simpler, more pragmatic approach using string
templates. In \Rascal string literals can be interpolated with
expressions, conditional statements (if) and loops (for, while,
do-while). This makes string templates very convenient for generating
code. Moreover, using the explicit margin markers (single quote '),
such templates are convenient to write and the result will be
automatically indented. For instance, Listing~\ref{LST:rascal-stat2c} shows the code to generate
code for WHILE loops and IF statements a C while-loop.
\begin{listing}
\begin{quote}
\begin{rascal}
str stat2c(whileDo(c, b)) = "while (<exp2c(c)>) {
                            '  <stats2c(b)>
                            '}";

str stat2c(ifThen(c, b, ei, ep)) = 
                            "if (<exp2c(c)>) {
                            '  <stats2c(b)>
                            '}<for (<ec, eb> <- ei) {>
                            'else if (<exp2c(ec)>) {
                            '  <stats2c(eb)>
                            '}<}>
                            '<if (ep != []) {>
                            'else {
                            '  <stats2c(ep)>
                            '}<}>";
\end{rascal}
\end{quote}
\caption{Using auto-indenting string templates to generate C-code}
\label{LST:rascal-stat2c}
\end{listing}
In both string templates the statements within statement blocks will
be indented relative to the margin. All whitespace to the left of the
margin is discarded.


% combination of tasks and sublanguages
\subsection{Artifacts}


\begin{table}
\begin{center}\small
\begin{tabular}{|l||r|r|r|r|r||r|}\hline
      & \multicolumn{1}{c|}{T1:} 
      & \multicolumn{1}{c|}{T2:} 
      & \multicolumn{1}{c|}{T3:} 
      & \multicolumn{1}{c|}{T4:} 
      & \multicolumn{1}{c||}{T5:} & \multirow{2}{*}{Total}
       \\
  & parse/pp & bind & check & desugar & compile &
       \\\hline\hline
L1  &    244 & 162 & 146 &-- &   60 &  612   \\\hline
L2  &    80  & 44  & 26 & 39 &  --  &  189   \\\hline
L3  &    73  & 117 & 39 & -- &  106 &  335   \\\hline
L4  &    90  & 67  & 53 & -- & 48   &  258   \\\hline\hline
Total & 487  & 390 & 264 & 39 & 214 & 1394 \\\hline 
\end{tabular}
\end{center}
\caption{Overview of the structure and size (\#SLOC) of the Rascal implementation of each task for each language level.}
\label{TBL:rascalArtifacts}
\end{table}

\noindent The Rascal \oberon implementation consists of the modular implementation of each task for each language level (when appropriate). An overview is shown in Table~\ref{TBL:rascalArtifacts}. Task T1 includes the code for the grammar, AST data type, pretty printing rules and AST normalization. The code for T2 consists of the scope data type and the \irascal{bind} function. T3 is covered by the \irascal{check} function. T4 only applies to L2 and performs the desugaring of FOR and CASE statements. Finally, the code for compilation to C (T5) includes functions for lifting nested procedures and renaming identifier. 

The columns in Table~\ref{TBL:rascalArtifacts} capture functional dependency. For instance, T3 (checking) requires that name analysis has been applied to the AST. The horizontal dimension indicates extension: each higher level (lower in the table) extends the previous level. The extension applies to both grammars, algebraic data types, and functions. 


%     T1: parse/format &  T2: bind & T3: check  & T4: desugar & T5: compile \\
% L1    x   &         &   x   &       &   
% L2    x   &   x     &   x   &       &    
%A2a
% L3    x   &   x     &   x   &       &
%A2b    
% L1    x   &   x     &   x   &   x   &
% L2    x   &         &   x   &   x  &     


\subsection{Conclusions}

\noindent The experience developing \oberon in Rascal has been generally positive. Nevertheless, there are some areas where we think Rascal could still be improved. Below we discuss what we think contributed to our positive experience and identify three areas where Rascal could be better.

Rascal's grammar formalism is very powerful and expressive. As a result the grammar can be written with an eye to the desired abstract syntax: essentially the grammar does not have to be factored in awkward ways to satisfy the parser.

Although tool support for Rascal is currently in an early stage, the Eclipse-based IDE, read-eval-print-loop (REPL) and built-in testing framework are very helpful in developing and debugging language implementations. Furthermore, Rascal includes the tools \textsc{AmbiDexter}~\cite{AmbiDexter} and \textsc{DrAmbiguity}~\cite{DrAmbiguity} for (conservatively) checking (non-)ambiguity, and for diagnosing ambiguous parse forests respectively. These tools have been instrumental in creating correct grammars for \oberon. In fact, \textsc{AmbiDexter} was able to establish that our grammar of \oberon is unambiguous.

Although the AST could easily be automatically derived from concrete syntax trees, the current scheme of using the abstract ADT as a recipe to guide the ``implosion'' process allows additional flexibility.  This allows, for instance, the implosion of lexical tokens (identifiers, integers etc.) to different Rascal primitive types. It also supports skipping over irrelevant chain rules (injections) and flattening of nested lists. 

The feature that makes this kind of guided implosion work, is that Rascal types can be inspected at runtime. This was also essential for letting the pretty printer insert parentheses if and when expressions needed them. Since a Rascal grammar defines a proper type, we could inspect the \oberon grammar itself to derive the precedence relation. 

The name and type analysis tasks are implemented by programming. The powerful pattern matching and traversal features and built-in data structures of Rascal make this quite easy. By breaking a function in separate ``rules'' that dispatch on the relevant AST constructors both analyses as well as code generation could be modularized and extend in later language levels.

With respect to extension, however, we sometimes had to anticipate later extension of the code.  For instance, the Rascal \texttt{extend} mechanism is currently not powerful enough to \textit{override} a previous definition. For instance, if for a certain language extension an existing language construct needs to be reinterpreted, it is currently impossible to do this. In other words, the only extensibility Rascal currently caters for is \textit{syntactic}: you may add another function alternative (e.g., bind, check) for the new construct, but you cannot revisit an existing case. We are currently investigating how function definitions can override a previous definition handling the same case. This would allow calling the previous definition using a special keyword (e.g., \texttt{super}).

Another direction for improvement is concerned with how state and context information currently has to be manually threaded through recursive functions. A restricted form of dynamically scoped variables could eliminate a significant amount of boilerplate and scaffolding code. Such variables would work like function parameters, except that they are not explicitly passed around. For instance, the \irascal{bind} function currently receives the current set of error messages and has to return tuples of an annotated AST and the new set of error messages. With dynamic variables, the set of error messages would be initialized as a local, but dynamically scoped, variable in the first call to \irascal{bind}. Each recursive invocation \irascal{bind} would update that very same variable.

The code generation to C is currently implemented using Rascal's string templates. While very convenient and flexible, there is some overlap in functionality with pretty printing using Box. It would be interesting to see if some of the formatting features of Box could be integrated into the string templates. More generally, we think a better integration of Box into the language as a whole would be make the development of pretty printers much easier. This would involve a much closer integration with the concerete syntax features of Rascal, for two reasons. First, the system could then automatically deal with parenthesis insertion. Second, since the input to the pretty printer is an AST, all comments are lost. In many cases this is unacceptable.

To conclude, our experience implementing \oberon shows that Rascal is a suitable language for prototyping languages in a modular fashion, with relatively little effort. All five tasks across the four language levels have been implemented in under 1500 source lines-of-code. Finally, we have implemented an \oberon IDE using Rascal's lightweight hooks into Eclipse. 

